This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Use multi-tensor zeroing for resetting grads #19894

Merged 1 commit on Feb 15, 2021

Conversation

MoisesHer
Contributor

Description

Use a multi-tensor kernel strategy for resetting gradients.
Undoes #16716.

Changes

  • In the zero_grad method, use ndarray.reset_arrays to reset multiple arrays within the same kernel
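
A minimal sketch of the idea behind this change, not the code in this PR: gradients are grouped by device context, and each group is reset with a single call instead of one call per array. The function name `zero_grads_multi_tensor`, the `reset_fn` parameter, and the assumption that each gradient exposes a `context` attribute are hypothetical and chosen for illustration. With MXNet, `reset_fn` is assumed to be `mx.nd.reset_arrays`, called as `reset_arrays(*arrays, num_arrays=len(arrays))`.

```python
from collections import defaultdict


def zero_grads_multi_tensor(grads, reset_fn):
    """Zero gradients with one reset_fn call per device context.

    grads:    iterable of gradient arrays, each assumed to expose a
              `context` attribute (hypothetical input shape).
    reset_fn: callable invoked as reset_fn(*arrays, num_arrays=n);
              assumed to be mx.nd.reset_arrays when used with MXNet.
    Returns the number of reset calls issued.
    """
    # Collect the gradient arrays that live on each context.
    by_context = defaultdict(list)
    for grad in grads:
        by_context[grad.context].append(grad)

    # One multi-tensor call per context instead of one call per array.
    for arrays in by_context.values():
        reset_fn(*arrays, num_arrays=len(arrays))

    return len(by_context)
```

Passing the reset function as an argument keeps the sketch independent of MXNet, so the grouping logic can be exercised with stand-in objects. Sparse gradients would likely need separate handling, which this sketch does not cover.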

Comments

Performance observed when training BERT-large on a single V100 GPU:

Throughput (samples/s):

| Batch size - batch accumulation | Pre-change | This PR (multi-tensor) | Improvement (%) |
|---|---|---|---|
| 4 - 2 | 29.82 | 36.24 | 21.5 |
| 4 - 4 | 36.89 | 41.45 | 12.3 |
| 8 - 4 | 45.25 | 48.41 | 6.9 |

@MoisesHer MoisesHer requested a review from szha as a code owner February 12, 2021 23:46
@mxnet-bot

Hey @MoisesHer, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [unix-gpu, sanity, edge, miscellaneous, unix-cpu, centos-cpu, clang, windows-cpu, centos-gpu, windows-gpu, website]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@lanking520 lanking520 added the pr-awaiting-testing PR is reviewed and waiting CI build and test label Feb 12, 2021
Member

@sxjscience sxjscience left a comment

LGTM. We may move the revision of reset_arrays to a future PR.

@lanking520 lanking520 added the pr-work-in-progress (PR is still work in progress), pr-awaiting-testing (PR is reviewed and waiting CI build and test), and pr-awaiting-merge (Review and CI is complete. Ready to Merge) labels and removed the pr-awaiting-testing and pr-work-in-progress labels Feb 13, 2021
@leezu leezu merged commit da24765 into apache:master Feb 15, 2021